Abstract

Background: The Molecular Interaction Map (MIM) notation offers a standard set of symbols and rules on their 
usage for the depiction of cellular signaling network diagrams. Such diagrams are essential for disseminating 
biological information in a concise manner. A lack of software tools for the notation restricts wider usage of the 
notation. Development of software is facilitated by a more detailed specification regarding software requirements 
than has previously existed for the MIM notation. 

Results: A formal implementation of the MIM notation was developed based on a core set of previously defined 
glyphs. This implementation provides a detailed specification of the properties of the elements of the MIM 
notation. Building upon this specification, a machine-readable format is provided as a standardized mechanism for 
the storage and exchange of MIM diagrams. This new format is accompanied by a Java-based application 
programming interface to help software developers to integrate MIM support into software projects. A validation 
mechanism is also provided to determine whether MIM datasets are in accordance with syntax rules provided by 
the new specification. 

Conclusions: The work presented here provides key foundational components to promote software development 
for the MIM notation. These components will speed up the development of interoperable tools supporting the 
MIM notation and will aid in the translation of data stored in MIM diagrams to other standardized formats. Several 
projects utilizing this implementation of the notation are outlined herein. The MIM specification is available as an 
additional file to this publication. Source code, libraries, documentation, and examples are available at
http://discover.nci.nih.gov/mim.



Background 

Diagrams have long been used to organize knowledge, 
and there has been an ever-growing use of such dia- 
grams in biological sciences in the last half century. As 
their use increases so does the need for common meth- 
ods to communicate biological knowledge accurately 
from author to reader in a manner similar to other dis- 
ciplines that use technical drawings. Advances in mole- 
cular biology experimental techniques have resulted in 
an abundance of high-throughput data, placing addi- 
tional emphasis on the need for the organization and 
visualization of biological data. Since its inception in
1999, the Molecular Interaction Map (MIM) notation 
has helped address this need for standardized represen- 
tation of biochemical and cellular processes through a 
notation that shares visual characteristics with electrical 
circuit diagrams [1]. The notation has been featured in a 
variety of publications as the mechanism used to orga- 
nize biological information and the basis for mathemati- 
cal simulations [2-14]. The MIM notation has also 
garnered wide attention in the systems biology commu- 
nity. It has been advocated as a notation for graphical 
display of purely textual datasets such as those based on 
the BioPAX ontology [15]. The notation has also helped
spur the creation of the Systems Biology Graphical 
Notation (SBGN) consortium, which uses the MIM 
notation as a basis for its SBGN Entity-Relationship
(SBGN ER) language [16,17]. Potentially, the two graphi- 
cal notations may converge. 

As the amount of data in MIM diagrams increases, 
the availability of that data becomes a priority because 
the diagrams can serve as sources of data to be mined. 
The information content of MIM diagrams is often 
extended with annotations containing ancillary informa- 
tion, such as comments, external links, and citations; 
annotations are denoted as labels on interaction lines 
[1]. Annotations provide readers with additional insight 
into the systems they represent, which may not be cap- 
tured by the MIM glyphs per se (e.g. information 
regarding time, location, and sequence of events). These 
annotations have recently been mined in the work by 
McIntosh and Curran, who built a MIM corpus that
maps MIM annotations to passages from the original 
research articles [18]. 

The present work outlines an implementation of the 
MIM notation and provides a new specification with a 
series of software tools based on this specification. Pre- 
sented first is the new specification that addresses pre- 
vious ambiguities in the notation, provides definitions as 
a foundation for translation, and establishes a set of syn- 
tax rules for the validation of MIM diagrams; the speci- 
fication is provided as Additional file 1. This 
specification forms the basis of an XML-based format 
that includes elements to capture both the graphical and 
non-graphical elements of MIM diagrams. The next 
topic presented is a mechanism for validating MIM 
datasets according to the syntax rules found in the spe- 
cification. Lastly, an application programming interface 
(API) is provided as a support mechanism for develo- 
pers to interact with specific features of the MIM 
format. 

Methods and Results 

Description of the Formal MIM Specification 

The MIM notation has been described previously in sev- 
eral publications [1,19-21]. The formal implementation 
presented below is based on the most widely used fea- 
tures in the MIM notation as presented in 2006 [21], 
and retains the goal of the MIM notation to present 
unambiguous and accurate diagrams of biological sys- 
tems, while simplifying the visualization of these dia- 
grams. The complete MIM specification is provided as 
Additional file 1 and online on our project homepage 
(http://discover.nci.nih.gov/mim).

MIM Notation Elements 

MIM diagrams represent bioregulatory networks invol- 
ving graphical elements broadly divided into two cate- 
gories: entity and interaction glyphs. Entities may 
represent objects in nature with physical structures,
such as biological molecules (e.g. protein, DNA, RNA,
etc), or non-physical objects, such as phenotypes, beha- 
viors, perturbations, cell cycle states, etc. Interactions 
are relationships between an entity and other entities or 
interactions. Interactions between two entities are
represented in the form of binding interactions (e.g. the
binding of calcium to calmodulin) or the transformation of
one entity into another (e.g. the conversion of ADP and
phosphate to ATP). An interaction between an entity and another
interaction can be used to describe the influence the 
entity exerts on the interactions, such as the inhibition 
of binding between two proteins by a third. The basic 
graphical elements that represent MIM elements are 
referred to as glyphs and are shown in Figure 1; full
details on their usage are provided in Additional file 1.

Entity glyphs are differentiated by their shapes, and 
the various types of interactions are represented by lines 
with different arrowheads or other line end-marks. For 
some glyphs, the semantics of the glyph are determined 
by the context in which the glyph is used; examples of 
these cases include the production without loss and
stimulation glyphs. The simple physical entity, entity
feature, and restricted copy entity glyphs are shown
in Figure 1, while explicit and implicit complex
formation are shown in Figure 2. As has been the case with
previous MIM specifications, the color of a glyph does 
not affect its semantics. The accompanying MIM speci- 
fication describes the full set of MIM glyphs shown in 
Figures 1 and 2 along with their appropriate use. 

Entity Glyphs 

The MIM notation supports various types of entities, 
each represented by a different glyph. The most com- 
mon entity glyph is a labeled rounded-rectangle that 
represents a simple physical entity (SPE), where
"simple" denotes that the entity is not in a complex with
other entities; this glyph is used to represent molecules 
such as proteins, DNA, RNA, etc. The labels of SPEs are 
the main identifiers used by readers in understanding 
MIM diagrams, and it would be helpful for the purpose 
of data exchange that standardized nomenclatures, such 
as HGNC names for genes, be used. 

An SPE can also be represented in a manner that 
defines specific regions of molecules, such as protein 
domains, motifs, and sites. This representation provides 
a more detailed description of an SPE through the use 
of entity features. Entity features can be used to indicate 
specific regions of an SPE that carry out a particular 
function as illustrated in Figure 2. 

SPEs are typically represented only once in a given 
diagram, which allows for the traceability of all the 
interactions of a given entity to a single location on a 
diagram. Modifiers are physical entities that are repre- 
sented as labels without borders and generally represent 
small molecules (e.g. phosphate, methyl, or ubiquitin
molecules) that can be depicted multiple times in a
diagram. Such small molecules tend to exist frequently in
pathway interactions, and it would be prohibitive to
route all the connections of a small molecule to a single
glyph.

Figure 1 Basic glyphs used in this implementation of the MIM notation. The MIM notation consists essentially of glyphs representing
entities and interactions; shown are the independent glyphs. A description of the usage for all the glyphs in the MIM notation is provided in
Additional file 1. Panels: entity glyphs (conceptual entity, simple physical entity, entity feature, source/sink, modifier, restricted copy);
reaction glyphs (non-covalent reversible binding, covalent modification, covalent irreversible binding, stoichiometric conversion,
production without loss, template reaction); catalytic interaction glyphs (covalent bond cleavage, catalysis); contingency glyphs
(stimulation, necessary stimulation, inhibition, absolute inhibition).

SPEs can exist in complexes with other entities. There 
are two types of complexes in MIM: explicit and impli- 
cit complexes as shown in Figure 2. Explicit complexes 
are diagrammed as small filled circles on binding inter- 
action lines, and are termed "explicit" because the bind- 
ing partners that give rise to them are shown. Implicit 
complexes, however, are diagrammed as enclosures of 
SPEs and indicate an implied relationship between the 
SPEs without showing their direct interactions. Figure 3 
shows a complex of SPEs A, B, and C as an explicit
complex. The interaction between A and B forms the
complex A:B, which is then bound to C. As represented 
as an explicit complex, A and B cannot be unbound if C 
is bound to the A:B complex; this represents a common 
mechanism whereby C stabilizes the A:B interaction. 
The notation implies that SPEs A and B must bind to 
each other before C can bind to them. If A and B 
unbind, then there would no longer be an A:B for C to 
bind to. Therefore, SPE C must dissociate first before A 
and B can dissociate. Unlike the explicit complex
representation, the implicit complex shown in Figure 2
does not provide readers of the MIM diagram with
information on the binding order of entities. The implicit
representation is useful when binding order is unknown or
when those details are not relevant to the intent of the 
diagram. 
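
The ordering constraint implied by an explicit complex can be made concrete with a short sketch. The Python below is illustrative only and is not part of the MIM software: the nested-tuple representation and the function names are assumptions introduced here. It treats an explicit complex as a pair of binding partners and derives the order in which complexes must form, with dissociation taken as the reverse of that order.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a binding partner is either an SPE label or
# an explicit complex, written as a pair of binding partners.
Partner = Union[str, Tuple["Partner", "Partner"]]


def name(p: Partner) -> str:
    """Render a partner using the A:B naming used in the text."""
    return p if isinstance(p, str) else f"{name(p[0])}:{name(p[1])}"


def binding_order(p: Partner) -> List[str]:
    """List complexes in the order they must form (inner complexes first)."""
    if isinstance(p, str):
        return []
    left, right = p
    return binding_order(left) + binding_order(right) + [name(p)]


def dissociation_order(p: Partner) -> List[str]:
    """Complexes come apart in the reverse of their formation order."""
    return list(reversed(binding_order(p)))


trimer = (("A", "B"), "C")  # A binds B, then C binds the A:B complex
print(binding_order(trimer))       # ['A:B', 'A:B:C']
print(dissociation_order(trimer))  # ['A:B:C', 'A:B']
```

An implicit complex would carry no such nesting, which is why no binding order can be read from it.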
Figure 2 More advanced use cases of MIM glyphs. Examples of cases involving multiple MIM glyphs to represent a concept. Panels:
complex formation (implicit complex, explicit complex); simple physical entity features; branching (branching glyph); additional
representations (dimer formation using restricted copy, intermodular interaction, state combination).



Since SPEs tend to appear once on a diagram, it 
becomes necessary to have a method for diagramming 
homo-dimerization interactions. Such interactions make 
use of restricted copy entities, small black dots that act 
as copies of the SPEs to which they are bound. 
Restricted copy entities cannot be included in any other 
interaction; the complexes resulting from homo-dimerization
interactions can participate in interactions
in the same way as other explicit complexes.

Figure 3 Example of a trimer between simple physical entities A, B, and C. The explicit complex node on the binding interaction
line between entities A and B denotes the dimer A:B and the
binding interaction between this complex and C denotes the trimer
A:B:C.

There are two additional MIM entity glyphs: concep- 
tual entity and source/sink glyphs. Conceptual entities, 
shown by a rectangular glyph, can be used to represent 
objects that do not have a clear physical structure (e.g. 
ionizing radiation) or whose physical structure is outside 
the scope of the given diagram. The source/sink symbol 
uses the mathematical symbol for an empty set and can 
be used to represent an unlimited and unspecified 
source for the production of an entity or an unspecified 
product of a degradation reaction. 

Interaction Glyphs 

The notation has three categories of interactions that 
exist between entities: reactions (i.e. an interaction 
where the input and output are both entities), catalytic 
interactions (i.e. an abbreviation for a known set of reac- 
tions), and contingencies (i.e. an interaction in which a 
controller entity regulates, modifies, or otherwise
influences another reaction or contingency); all the
interaction glyphs are shown in Figure 1. For brevity, only the
usage of a limited number of interactions is presented 
here. Readers are directed to the MIM specification for 
a description of all interaction types. The reaction types 
include non-covalent reversible binding, covalent modi- 
fication, covalent irreversible binding, template reaction, 
stoichiometric conversion, and production without loss 
of reactants. 

Non-covalent reversible binding is represented as a 
double-headed line with barbed arrowhead endings con- 
necting two entities; barbed arrowheads are permissible 
only in accordance with the specification rules outlined 
in Table three of the MIM specification. The outcome of 
an interaction represented by a non-covalent reversible 
binding glyph is an explicit complex, indicated by a filled 
circle on the interaction line; the resulting explicit com- 
plex can then participate in additional valid interactions. 

Covalent irreversible binding functions in the same 
manner as non-covalent reversible binding, but with dif- 
ferent semantics, namely that those interactions must be 
covalent and not directly reversible. Covalent modifica- 
tion uses a single-headed line with a barbed ending 
pointing towards the modified entity with a modifier 
entity on the end of the line that is not barbed, as 
shown in Figure 4. 

Stoichiometric conversion uses a single-headed line 
with a triangle end that points from the reactant to the 
product. The stoichiometric conversion can be used to 
describe the production of multiple entities, as shown in 
Figure 5. 

The MIM notation also offers glyphs to describe the 
stimulation and inhibition of other interactions, and 
these are part of the set of contingency interactions. An 
unfilled triangle arrowhead is used to represent stimula- 
tion, while a terminal bar is used for inhibition. To indi- 
cate that an entity is necessary for a process to occur, a 
bar is placed behind the unfilled triangle of the stimula- 
tion glyph. 

Catalytic interactions follow a similar set of syntactic 
rules as contingencies, but represent interactions requir- 
ing a catalyst (e.g. the enzymatic activity of a kinase or 
phosphatase). 

The MIM specification provides additional examples 
and guidance on the usage of the glyphs in the notation, 
especially in more complicated situations. 



Figure 4 Formation of the phosphorylated entity A. Explicit
complexes placed on the covalent modification lines represent the
modified entity.



Figure 5 Conversion of one entity into multiple entities. The
stoichiometric conversion of entity A to entities B and C.



Formal Rules for MIM 

One of the key features of this new implementation of 
the MIM notation is the introduction of a strict set of 
syntax rules that provide constraints on valid MIM dia- 
grams. Previous MIM publications presented both basic 
and elaborate examples, but did not codify the extent of 
the syntactic capabilities of the elements of the notation. 
There was no clear method to validate diagrams in 
terms of the manner in which elements are connected. 
The new syntax rules presented here will help users cre- 
ate and update valid MIM diagrams. 

The syntax rules presented in Table three and Table 
four in Section 8.1 of the MIM specification treat each 
interaction glyph as having three possible places of con- 
nection: the start, the end, and the line itself. For sym- 
metric interactions (e.g. non-covalent reversible binding 
and covalent irreversible binding), either terminus of the 
line may be considered the start of the interaction with 
respect to the syntax rules. For all other interaction
types, the line terminus without an arrowhead is consid- 
ered the start of the interaction. The syntax rules outline 
what entities may connect to the start and end of an 
interaction line and whether a symbol can exist on an 
interaction line between its termini. Additional rules in 
the formal MIM specification outline the usage of 
branching glyphs, as well as other syntactic rules. 
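
As a rough illustration of the start/end convention described above, the following Python sketch picks the start terminus of an interaction from the arrowhead values of its two line ends. It is a minimal sketch under stated assumptions, not the rule set of the specification: the value "Line" for an end without an arrowhead is taken from the example dataset in Figure 6, while the function name and the use of equal arrowheads as a proxy for a symmetric interaction are assumptions made here.

```python
PLAIN_END = "Line"  # value used in Figure 6 for a line end without an arrowhead


def start_index(arrow_heads):
    """Return 0 or 1: which of the two line ends is treated as the start."""
    first, last = arrow_heads
    if first == last:
        # Assumed proxy for a symmetric interaction: either terminus
        # may serve as the start, so the first one is chosen.
        return 0
    if first == PLAIN_END:
        return 0
    if last == PLAIN_END:
        return 1
    raise ValueError("two different arrowheads: start terminus is undefined")


# The two interactions of Figure 6, written as (first end, last end).
print(start_index(("Line", "CovalentModification")))   # 0
print(start_index(("CovalentBondCleavage", "Line")))   # 1
```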

In addition to these rules, the MIM specification out- 
lines how glyphs should be interpreted in conjunction 
with other glyphs. Section 8.4 of the MIM specification 
specifically outlines correct interpretations of the pre- 
sence of entities given the potential incompleteness of 
knowledge about a particular entity based on the inter- 
actions in a diagram. This is closely aligned to the ideas 
of the "heuristic" MIM interpretation, which recognizes 
that the role of "transitive" effects (the effect of a given 
interaction on others) is often unknown [21,22]. 

Limitations of the Formal Implementation 

The MIM notation formalized by the current specifica- 
tion is constrained to facilitate the implementation of 
software tools for MIM, and does not include all MIM 
glyphs that have been published previously. In the nota- 
tion's history there have been variations in the represen- 
tation of certain glyphs; this issue has been addressed by 
choosing a single representation for each concept in the 
notation. This implementation of the notation does not 
permit ad hoc glyph creation. This facilitates the
validation of diagrams and allows software developers to be
confident that they have implemented all known fea- 
tures of the implementation; other glyphs may be added 
in future releases of the specification. 

As the notation has evolved, several shorthand nota- 
tion elements have been developed to simplify common 
patterns in MIM diagrams. The current implementation 
includes only some of these shorthand notation ele- 
ments from previous MIM publications. Undoubtedly, 
this means that certain constructs in diagrams will be 
more visually complex, but since these elements do not 
add to the semantic capacity, they have been postponed 
for future releases. 

The specification presented here does not provide 
strict guidance on the glyphs of the notation that are 
suitable for computational simulations; this topic has 
been discussed previously regarding the manner in 
which the MIM notation may be used in conjunction 
with mathematical simulations [2,21]. One future
development that may support such a goal is an
additional validation method for glyphs appropriate for
computational simulations.

Lastly, this implementation does not specify a notation 
to represent transport interactions. Entities in the MIM 
notation may represent a given entity in multiple states 
(e.g. a protein with some molecules in the nucleus and 
others in cytoplasm). With no clear way of distinguish- 
ing these states, the semantics of a transport interaction 
would be unclear; this is a problem common to other 
similar notations such as the SBGN ER Level 1 Version 
1.2 notation [16]. In previous MIM publications, trans- 
port reactions have been represented by a stoichiometric 
conversion glyph. In many cases this representation is 
clear and unambiguous. That representation, however, 
sometimes introduces awkward ambiguities. Therefore, 
we did not include transport reactions in the current 
version of the formal MIM specifications. 

Limitations of Current Machine-Readable Representations 
of MIM Diagrams 

Several software projects have included support for the 
notation and have each addressed the lack of a standar- 
dized data model for the MIM notation differently. The 
first associated software project for MIM diagrams
integrated the diagrams with metadata in the form of
E-MIMs (electronic MIMs) found at
http://discover.nci.nih.gov/mim [21]. E-MIMs store the diagrams using the
SVG (Scalable Vector Graphics) format to provide inter- 
active features allowing the graphic elements to be con- 
nected to metadata. The SVG format does not retain 
the semantics of the elements visualized, and the meta- 
data, currently, are reintroduced to the SVG files 
through a post-processing step. 



Another project, the Java-based biological pathway 
diagram editor PathVisio, has included the glyphs of the 
notation for the purpose of facilitating the production of 
MIM diagrams [23]. The PathVisio software provides 
MIM-specific interaction glyphs for diagrams that are 
stored using the GPML (GenMAPP Pathway Markup 
Language) format. The MIM-specific glyphs were pro- 
vided as additions to the pre-existing PathVisio glyphs 
that are external to the MIM notation. If users include 
both MIM-specific and external glyphs, a diagram will 
be viable in the context of the GPML format, but will 
lose the consistency required for exchange with other 
MIM-specific tools and tasks, such as validation. 

Software support for MIM was recently provided by 
the MIMCITY database project (http://www.mimcity.org)
for storing, querying, visualizing and analyzing data
contained in MIM diagrams [Karac, et al., in prepara- 
tion]. The project developed an accompanying data 
model implemented in the form of a database schema 
that is compatible with an SBML-based representation 
of the MIM notation also developed for that project. 
The SBML-based MIM representation addresses
incompatibilities between MIM and SBML through the
use of SBML annotation containers to embed MIM
information content not supported in SBML. The
MIMCITY database schema and the SBML-based MIM
representation, however, do not include elements to 
describe the visualization of MIM diagrams. 

Overview of MIM Schema 

The various formats used in the aforementioned projects 
have limitations that highlight the need for a standard 
format to support future MIM software projects. The 
new MIM Markup Language (MIMML) meets this need 
and provides a standard format for the exchange of MIM 
diagrams among different software. The MIMML format 
conforms to XML Schema 1.0, and is based on the 
GPML schema developed for use with PathVisio [23,24]. 
MIM datasets are plain-text XML data streams: the data- 
sets are characterized by matching start and end tags, 
and elements can contain attribute-value pairings. The 
schema is used to store information about the visual pre- 
sentation (e.g. positioning, color, or size) and layout of 
the diagram as well as accompanying metadata (relation- 
ships to external databases, annotations, and citations). 

The MIMML format employs several XML elements 
of different types; the ones of most importance are out- 
lined here. The root element of the MIMML schema is 
the Diagram XML element used primarily to store size 
information about the diagram. This element can have 
several types of child elements; primarily these include: 
EntityGlyph, InteractionGlyph, Anchor and MimBio 
XML elements; Figure 6 shows a small MIMML dataset 
that includes examples of all the MIMML elements 
described; other example files on the website have more
comprehensive examples (http://discover.nci.nih.gov/mim).

<?xml version="1.0" encoding="UTF-8"?>
<mimVis:Diagram width="158.4" height="146.3" xmlns:mimVis="http://lmp.nci.nih.gov/mim/mimVisLevel1">
  <mimVis:EntityGlyph visId="a6fc4" centerX="47.0" centerY="123.0" width="60.0" height="20.0"
      color="000000" type="SimplePhysicalEntity" displayName="CaMK">
  </mimVis:EntityGlyph>
  <mimVis:EntityGlyph visId="e839e" centerX="47.0" centerY="38.0" width="20.0" height="20.0"
      color="000000" type="Modifier" displayName="P">
  </mimVis:EntityGlyph>
  <mimVis:EntityGlyph visId="a677d" centerX="114.0" centerY="74.0" width="60.0" height="20.0"
      color="000000" type="SimplePhysicalEntity" displayName="Phtase">
  </mimVis:EntityGlyph>
  <mimVis:InteractionGlyph visId="id4f7e9fd7" color="000000">
    <mimVis:Point x="47.0" y="48.0" arrowHead="Line" visRef="e839e" relX="0.0" relY="1.0"/>
    <mimVis:Point x="47.0" y="113.0" arrowHead="CovalentModification" visRef="a6fc4" relX="0.0" relY="-1.0"/>
    <mimVis:anchorRef>bd4fd</mimVis:anchorRef>
    <mimVis:mimBioRef>b8e</mimVis:mimBioRef>
  </mimVis:InteractionGlyph>
  <mimVis:InteractionGlyph visId="id82185491" color="000000">
    <mimVis:Point x="47.0" y="74.0" arrowHead="CovalentBondCleavage" visRef="bd4fd" relX="0.0" relY="0.0"/>
    <mimVis:Point x="84.0" y="74.0" arrowHead="Line" visRef="a677d" relX="-1.0" relY="0.0"/>
  </mimVis:InteractionGlyph>
  <mimVis:Anchor visId="bd4fd" position="0.4" type="Invisible"/>
  <mimVis:MimBio>
    <mimVis:title>CaMK Phosphorylation</mimVis:title>
    <mimVis:identifier>20101117</mimVis:identifier>
    <mimVis:PublicationXRef visId="b8e">
      <mimVis:db>PubMed</mimVis:db>
      <mimVis:id>16267266</mimVis:id>
      <mimVis:title>Molecular interaction maps of bioregulatory networks: a general rubric for systems biology.</mimVis:title>
      <mimVis:author>Kohn KW</mimVis:author>
      <mimVis:author>Aladjem MI</mimVis:author>
      <mimVis:author>Weinstein JN</mimVis:author>
      <mimVis:author>Pommier Y</mimVis:author>
      <mimVis:year>2006</mimVis:year>
      <mimVis:journal>Mol Biol Cell</mimVis:journal>
    </mimVis:PublicationXRef>
  </mimVis:MimBio>
</mimVis:Diagram>

Figure 6 Example MIMML dataset that describes the cleavage of phosphate from CaMK. Blue: highlighted section
describes graphical elements of the MIM notation. Red: highlighted section describes metadata components of the MIM notation, including
diagram title and identifier and a publication cross-reference.

EntityGlyph is used to store information about all
MIM entities. This is a departure from GPML, which 
stores information about implicit and explicit complexes as
groups and anchors, respectively. This change allows for
a uniform mechanism when validating MIM entity 
glyphs. InteractionGlyphs are used to store information 
about interactions and the elements to which they are 
connected. An InteractionGlyph XML element is made 
up of several Point elements. Point elements provide
routing information and store the type of arrowhead used
in a given interaction as shown in Figure 6. For the pur- 
poses of validation, the MIMML schema, unlike GPML, 
restricts the types of arrowheads that can be used to 
only those that exist in the MIM notation. The first and 
last Points of an InteractionGlyph contain the visRef 
attribute to specify to which MIM element each line 
end is connected, as shown in Figure 6. InteractionGlyphs
may also include attributes pointing to Anchor XML 
elements. Anchors are connection points on an Interac- 
tionGlyph. These are used to connect interactions to 
each other, as is the case with contingencies, and also
to represent the intramolecular glyph on an interaction.
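
To show how the visRef attributes tie line ends to other elements, the following Python sketch reads a trimmed copy of the Figure 6 dataset and resolves the first and last Point of each InteractionGlyph. It is a minimal sketch using only the Python standard library and is independent of the Java API described in this work; the function name line_ends and the "anchor ..." and "unresolved" labels are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"mimVis": "http://lmp.nci.nih.gov/mim/mimVisLevel1"}

# Trimmed from Figure 6; layout attributes are omitted for brevity.
MIMML = """
<mimVis:Diagram xmlns:mimVis="http://lmp.nci.nih.gov/mim/mimVisLevel1">
  <mimVis:EntityGlyph visId="a6fc4" type="SimplePhysicalEntity" displayName="CaMK"/>
  <mimVis:EntityGlyph visId="e839e" type="Modifier" displayName="P"/>
  <mimVis:EntityGlyph visId="a677d" type="SimplePhysicalEntity" displayName="Phtase"/>
  <mimVis:InteractionGlyph visId="id4f7e9fd7">
    <mimVis:Point arrowHead="Line" visRef="e839e"/>
    <mimVis:Point arrowHead="CovalentModification" visRef="a6fc4"/>
    <mimVis:anchorRef>bd4fd</mimVis:anchorRef>
  </mimVis:InteractionGlyph>
  <mimVis:InteractionGlyph visId="id82185491">
    <mimVis:Point arrowHead="CovalentBondCleavage" visRef="bd4fd"/>
    <mimVis:Point arrowHead="Line" visRef="a677d"/>
  </mimVis:InteractionGlyph>
  <mimVis:Anchor visId="bd4fd" position="0.4" type="Invisible"/>
</mimVis:Diagram>
"""


def line_ends(xml_text):
    """Map each interaction's visId to (arrowHead, target) for its two ends."""
    root = ET.fromstring(xml_text.strip())
    names = {e.get("visId"): e.get("displayName")
             for e in root.findall("mimVis:EntityGlyph", NS)}
    for anchor in root.findall("mimVis:Anchor", NS):
        names[anchor.get("visId")] = "anchor " + anchor.get("visId")
    result = {}
    for glyph in root.findall("mimVis:InteractionGlyph", NS):
        points = glyph.findall("mimVis:Point", NS)
        # Only the first and last Points are resolved through visRef.
        result[glyph.get("visId")] = [
            (p.get("arrowHead"), names.get(p.get("visRef"), "unresolved"))
            for p in (points[0], points[-1])
        ]
    return result


for vis_id, ends in line_ends(MIMML).items():
    print(vis_id, ends)
```

In this dataset the second interaction ends on the anchor bd4fd rather than on an entity, which is how the cleavage interaction is attached to the covalent modification line.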

The XML elements for interactions and entities can 
also include references to particular metadata items 
stored in the MimBio XML element that acts as the
primary location for the storage of MIM metadata. The
MIMML format supports two types of metadata: cross- 
references and annotations. Cross-references allow the 
mapping of external database resources to MIM ele- 
ments. Annotations exist as two components: a com- 
ment and a publication cross-reference. This allows 
users to map particular interactions to the publications 
that provide evidence for the existence of the interac- 
tion, while the comments stored for specific MIM ele- 
ments provide additional information about entities and 
interactions not captured by the notation. The structure 
of MIM cross-references is modeled after those in the 
BioPAX format [25]. The MimBio XML element also 
stores metadata related to the diagram, such as title and 
creator information; these elements are similar to those 
provided in GPML with the exception that they use 
terms from the Dublin Core set of metadata terms 
(http://dublincore.org/). 
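
The link between a glyph and its supporting publication can be followed programmatically. The Python sketch below, again trimmed from Figure 6 and using only the standard library, resolves each mimBioRef of an InteractionGlyph to the db and id of the matching PublicationXRef. The function name publications_by_glyph is hypothetical, and the sketch assumes that references are matched on the visId attribute, as the example dataset suggests.

```python
import xml.etree.ElementTree as ET

NS = {"mimVis": "http://lmp.nci.nih.gov/mim/mimVisLevel1"}

# Trimmed from Figure 6; only the elements needed for the lookup are kept.
MIMML = """
<mimVis:Diagram xmlns:mimVis="http://lmp.nci.nih.gov/mim/mimVisLevel1">
  <mimVis:InteractionGlyph visId="id4f7e9fd7">
    <mimVis:mimBioRef>b8e</mimVis:mimBioRef>
  </mimVis:InteractionGlyph>
  <mimVis:MimBio>
    <mimVis:title>CaMK Phosphorylation</mimVis:title>
    <mimVis:PublicationXRef visId="b8e">
      <mimVis:db>PubMed</mimVis:db>
      <mimVis:id>16267266</mimVis:id>
      <mimVis:year>2006</mimVis:year>
    </mimVis:PublicationXRef>
  </mimVis:MimBio>
</mimVis:Diagram>
"""


def publications_by_glyph(xml_text):
    """Map each interaction's visId to the (db, id) pairs it cites."""
    root = ET.fromstring(xml_text.strip())
    xrefs = {}
    for xref in root.findall("mimVis:MimBio/mimVis:PublicationXRef", NS):
        xrefs[xref.get("visId")] = (
            xref.findtext("mimVis:db", namespaces=NS),
            xref.findtext("mimVis:id", namespaces=NS),
        )
    cited = {}
    for glyph in root.findall("mimVis:InteractionGlyph", NS):
        refs = [(r.text or "").strip()
                for r in glyph.findall("mimVis:mimBioRef", NS)]
        # References without a matching PublicationXRef are skipped here.
        cited[glyph.get("visId")] = [xrefs[r] for r in refs if r in xrefs]
    return cited


print(publications_by_glyph(MIMML))
# {'id4f7e9fd7': [('PubMed', '16267266')]}
```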

The MIMML schema adds metadata elements to allow 
controlled vocabulary to be used to describe the
relationship of an external database resource and a MIM
element and to allow users to specify the biological prop- 
erties of entities through controlled vocabulary beyond 
the generic terminology used by the MIM notation (e.g. a 
simple physical entity can be described as a protein, 
DNA, RNA, etc). The values for element type were 
adopted from the BioPAX format to simplify the process 
of translating MIM diagrams to BioPAX datasets. 

An Example Diagram and MIMML Dataset 

In this section, we provide an example of the MIM 
notation and MIMML format with the use of the
Ca2+/calmodulin-dependent protein kinase (CaMK) MIM
diagram (shown in Figure 7) stored in the MIMML for- 
mat and provided as supplemental information. The 
CaMK regulation MIM was originally introduced in the 
2006 specification of the MIM notation as Figure twelve 
of that publication; a full description of the interactions 
is given by Kohn et al. [21]. The diagram covers
many of the properties of the MIM notation, thereby 
making it useful when describing the changes that have 
been made to the MIM notation. The CaMK example 
shows the intramolecular control of the protein kinase 
CaMK, and how this regulation can affect the phosphor- 
ylation of substrates (labeled here as "Substrates") by the 
kinase domain of CaMK. Figure 7 shows the diagram 
according to the formal MIM specification presented here. The most
significant visual change is in the CaMK protein glyph. 
The entity glyph for the CaMK protein is linked to its 
two domains (the kinase and regulation domains), and 
the domains have been separated so they exist in two 
separate entity feature glyphs. The cleavage and
intramolecular glyphs have undergone cosmetic changes, as
has the way that branched interactions are
supported. Changes to the diagram are largely cosmetic to
simplify the implementation of the notation in software 
editors of MIM diagrams. 
Figure 7 CaMK MIM diagram based on the specification outlined in the current publication.



Luna et al. BMC Bioinformatics 2011, 12:167
http://www.biomedcentral.com/1471-2105/12/167






Validation of MIM Datasets 

Datasets are validated at two levels: the
first concerns the well-formedness of the dataset
according to the MIMML XML schema, and the second
concerns rules that are outlined in the formal MIM specification.
The MIMML XML schema outlines the valid
structure of a MIMML dataset, which can be used for 
validation purposes. The second level of rules checks 
proper usage of several properties of MIMML datasets: 
use of entity/interaction attributes, formats of labels for 
modifiers and entity features, use of interaction arrow- 
heads, placement of explicit complex and intramolecular 
symbols, and connection of interactions to entities or 
other interactions. Validation of MIMML datasets 
against the formal MIM connection rules is done using 
Schematron, a rule-based validation language for finding 
patterns in XML trees [26]. Assertions about the pre- 
sence or absence of these patterns can be used to deter- 
mine that a document adheres to a given rule set. 
Currently, MIMML datasets are not being validated 
against the layout rules and recommendations found in 
the formal MIM specification; the focus here is to vali- 
date the syntax of MIM diagrams. 
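
The first validation level can be illustrated with the standard javax.xml.validation classes of the Java platform. The following is a minimal sketch only: the toy schema, the element and attribute names in it, and the class and method names (SchemaLevelCheck, firstSchemaError) are assumptions made for illustration and are not the MIMML schema or part of the MIM API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Sketch of a first-level check: does a dataset conform to an XML schema? */
final class SchemaLevelCheck {

    // Hypothetical toy schema, NOT the MIMML schema: a Diagram element whose
    // EntityGlyph children must each carry a visId attribute.
    static final String TOY_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='Diagram'><xs:complexType><xs:sequence>"
      + "<xs:element name='EntityGlyph' maxOccurs='unbounded'>"
      + "<xs:complexType>"
      + "<xs:attribute name='visId' type='xs:string' use='required'/>"
      + "<xs:attribute name='type' type='xs:string'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns null when the document is valid, otherwise the first error message. */
    static String firstSchemaError(String xsd, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        try {
            // With no ErrorHandler set, the validator throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String valid = "<Diagram><EntityGlyph visId='e1'/></Diagram>";
        String invalid = "<Diagram><EntityGlyph type='ExplicitComplex'/></Diagram>";
        System.out.println("valid dataset error:   " + firstSchemaError(TOY_XSD, valid));
        System.out.println("invalid dataset error: " + firstSchemaError(TOY_XSD, invalid));
    }
}
```

A check of this kind covers structure only; rules about how glyphs may be combined require the second, rule-based level described above.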

Figure 8 illustrates a rule for a MIMML dataset that 
determines whether explicit complexes in a given MIM 
diagram were placed on the correct types of interactions; 



these interaction types include: covalent modification, 
non-covalent reversible binding, covalent irreversible 
binding, and state combinations. Results from validation 
are returned in the Schematron Validation Report Language
(SVRL), a simple XML-based report language
(http://www.schematron.com/validators.html). The 
results provide the name of the rule fired, the elements 
tested, the rule tested, the location of elements failing the 
test using an XPath expression, and diagnostic informa- 
tion relevant to invalid elements. 
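
The logic of the rule in Figure 8 can also be expressed as a single XPath 1.0 query evaluated with the javax.xml.xpath classes of the Java platform. This is a sketch under stated assumptions, not the Schematron implementation: the sample document omits the mimVis namespace for brevity, the "Stimulation" arrowhead value is a hypothetical example of a disallowed type, and the class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch of the explicit complex placement rule as a plain XPath query. */
final class ExplicitComplexRuleSketch {

    // Selects the visId of every explicit complex that is NOT anchored on an
    // interaction carrying one of the four allowed arrowhead types.
    static final String MISPLACED =
        "//EntityGlyph[@type='ExplicitComplex']"
      + "[not(@visId = //InteractionGlyph[Point["
      + "@arrowHead='CovalentModification' or "
      + "@arrowHead='NonCovalentReversibleBinding' or "
      + "@arrowHead='CovalentIrreversibleBinding' or "
      + "@arrowHead='StateCombination']]/anchorRef)]/@visId";

    /** Returns the IDs of explicit complexes that violate the placement rule. */
    static List<String> misplacedExplicitComplexes(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                MISPLACED, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                ids.add(hits.item(i).getNodeValue());
            }
            return ids;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Simplified, namespace-free stand-in for a MIMML dataset (assumption).
        String xml =
            "<Diagram>"
          + "<EntityGlyph visId='ec1' type='ExplicitComplex'/>"
          + "<EntityGlyph visId='ec2' type='ExplicitComplex'/>"
          + "<InteractionGlyph visId='i1'><anchorRef>ec1</anchorRef>"
          + "<Point arrowHead='NonCovalentReversibleBinding'/></InteractionGlyph>"
          + "<InteractionGlyph visId='i2'><anchorRef>ec2</anchorRef>"
          + "<Point arrowHead='Stimulation'/></InteractionGlyph>"
          + "</Diagram>";
        System.out.println("misplaced explicit complexes: " + misplacedExplicitComplexes(xml));
    }
}
```

Unlike a Schematron rule, this query reports only the offending IDs; it produces no diagnostics and no SVRL report.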

The MIM Schematron rule set can be used wherever 
Extensible Stylesheet Language Transformations (XSLT) 
may be used with other standard XML tools. To simplify
the use of the Schematron rule set, it is made available
in conjunction with the Java-based Schematron Ant
Task (http://code.google.com/p/schematron/) along with
a Java Ant build file to show how Schematron may be used
as a part of a pipeline and for the batch validation of 
multiple MIMML datasets. 
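
In a pipeline of this kind, a later step typically inspects the SVRL report produced for each dataset. The sketch below reads the failed assertions from a report using only the DOM parser of the Java platform. The SVRL namespace and the failed-assert, text, and location names come from the SVRL vocabulary; the sample report content and the class and method names are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Sketch: list the failed assertions recorded in an SVRL report. */
final class SvrlReportSketch {

    static final String SVRL_NS = "http://purl.oclc.org/dsdl/svrl";

    // Hypothetical report for one misplaced explicit complex (illustration only).
    static final String SAMPLE_REPORT =
        "<svrl:schematron-output xmlns:svrl='" + SVRL_NS + "'>"
      + "<svrl:active-pattern id='check-ec-placement'/>"
      + "<svrl:failed-assert location='/Diagram/EntityGlyph[2]' "
      + "test=\"$inter[mimVis:Point[@arrowHead='StateCombination']]\">"
      + "<svrl:text>Explicit complexes should only be placed on the "
      + "allowed interaction types.</svrl:text>"
      + "</svrl:failed-assert>"
      + "</svrl:schematron-output>";

    /** Returns one "location: message" entry per failed assertion. */
    static List<String> failedAsserts(String svrl) {
        Document doc;
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for the NS-based lookups below
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(svrl)));
        } catch (Exception e) {
            throw new IllegalStateException("report could not be parsed", e);
        }
        List<String> entries = new ArrayList<>();
        NodeList failed = doc.getElementsByTagNameNS(SVRL_NS, "failed-assert");
        for (int i = 0; i < failed.getLength(); i++) {
            Element assertion = (Element) failed.item(i);
            Node text = assertion.getElementsByTagNameNS(SVRL_NS, "text").item(0);
            String message = (text == null) ? "" : text.getTextContent().trim();
            entries.add(assertion.getAttribute("location") + ": " + message);
        }
        return entries;
    }

    public static void main(String[] args) {
        for (String entry : failedAsserts(SAMPLE_REPORT)) {
            System.out.println(entry);
        }
    }
}
```

An empty list indicates that no assertion failed, which a batch job could use to decide whether a dataset passes.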

MIM Application Programming Interface (API) 
Implementation 

Usage of an XML-based format to store the data of MIM 
diagrams allows developers to provide MIMML-related 
functionality using commonly available libraries capable of 
parsing XML data streams, but these libraries work at a 



<!-- Validate Explicit Complex Glyph Placement -->
<iso:pattern name="check-ec-placement" id="check-ec-placement">
  <iso:rule context="mimVis:EntityGlyph[@type='ExplicitComplex']">
    <iso:let name="vis-id" value="@visId"/>
    <iso:let name="inter" value="//mimVis:InteractionGlyph[mimVis:anchorRef=$vis-id]"/>
    <iso:assert test="
        $inter[mimVis:Point[@arrowHead='CovalentModification']] or
        $inter[mimVis:Point[@arrowHead='NonCovalentReversibleBinding']] or
        $inter[mimVis:Point[@arrowHead='CovalentIrreversibleBinding']] or
        $inter[mimVis:Point[@arrowHead='StateCombination']]"
      diagnostics="vis-id inter-vis-id inter-start-arrowhead inter-end-arrowhead">
      Explicit complexes should only be placed on the following interaction types: covalent modification,
      non-covalent reversible binding, covalent irreversible binding, or state combination.
    </iso:assert>
  </iso:rule>
</iso:pattern>

Figure 8 A Schematron-formatted rule validating the placement of explicit complex glyphs within a MIMML dataset. Blue: The context
to which the set of rule assertions refer; in this case, entity glyphs of type "explicit complex". Red: Assertions related to the context XML 
element. In this case, an explicit complex may only exist on interactions of the following types: covalent modification, non-covalent reversible 
binding, covalent irreversible binding, and state combination. Green: A set of diagnostic entries to be displayed, if the test assertion fails, 
including the ID of the explicit complex glyph (vis-id), the ID of the interaction on which the explicit complex appears (inter-vis-id), and the 
types of arrowheads at the start and end of the interaction glyph (inter-start-arrowhead and inter-end-arrowhead). 
low level, that is, at the level of XML elements and attributes. The
MIM API provides a higher level of functionality to interact
directly with features of the MIMML schema.

The MIM API is a Java-based API for retrieving and
manipulating the information contained in a MIM diagram
through the elements and attributes set forth by the
MIMML XML schema. The interface is generated using
XMLBeans (http://xmlbeans.apache.org/), a Java-to-XML
binding framework used for developing Java applications 
built around an XML schema. The framework provides 
wide coverage of the features available for XML Sche- 
mas and maps XML data types to Java data types. 
XMLBeans generates a set of corresponding Java classes 
based on an input XML Schema. These generated inter- 
faces and classes can then be used by developers to 
access and manipulate XML instance data using
JavaBeans-style accessors (e.g. getFoo() and setFoo()), which
are friendlier than direct use of the XML Document
Object Model (DOM). XMLBeans provides an XML
parser and validator, and it gives developers the capability
of lower-level navigation of an XML document using
XmlCursor. A complete description of XMLBeans is
available at (http://xmlbeans.apache.org/documentation/ 
index.html). The MIM API requires the installation of 
the underlying XMLBeans library (http://xmlbeans. 
apache.org). The XMLBeans library provides support for 
XPath and XQuery expressions using the Saxon XSLT 
and XQuery processor (http://saxon.sourceforge.net). 
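
The difference between DOM-level access and JavaBeans-style accessors can be illustrated without XMLBeans itself. The sketch below is a hand-written stand-in for the kind of typed wrapper that a binding framework generates from a schema; it is not generated code and not part of the MIM API. The class name EntityGlyphView, the element name EntityGlyph, and the attribute names visId and type are assumptions chosen to match the example rule in Figure 8, and the mimVis namespace is omitted for brevity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

/** Hand-written illustration of JavaBeans-style access layered over a DOM element. */
final class EntityGlyphView {

    private final Element element;

    EntityGlyphView(Element element) {
        this.element = element;
    }

    // Typed accessors hide the attribute names from calling code.
    String getVisId() { return element.getAttribute("visId"); }
    void setVisId(String visId) { element.setAttribute("visId", visId); }
    String getType() { return element.getAttribute("type"); }
    void setType(String type) { element.setAttribute("type", type); }

    /** Wraps the first EntityGlyph element found in the given XML text. */
    static EntityGlyphView firstIn(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("document could not be parsed", e);
        }
        Element first = (Element) doc.getElementsByTagName("EntityGlyph").item(0);
        if (first == null) {
            throw new IllegalArgumentException("no EntityGlyph element found");
        }
        return new EntityGlyphView(first);
    }

    public static void main(String[] args) {
        EntityGlyphView glyph =
            firstIn("<Diagram><EntityGlyph visId='e1' type='ExplicitComplex'/></Diagram>");
        glyph.setVisId("e2"); // no DOM calls needed in calling code
        System.out.println(glyph.getVisId() + " " + glyph.getType());
    }
}
```

A binding framework removes the need to write and maintain such wrappers by hand for every element in a schema.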

Usage of the MIM API 

Operations using the API are aligned to the MIM XML 
schema, and one Java object corresponds to each element 
in the MIMML document. All of the interfaces to the 
MIM elements inherit from the XmlObject interface,
provided by XMLBeans. This interface provides basic 
functionality for all objects, such as the method for vali- 
dation against the XML schema. Usage of the XMLBeans 
library provides the capability of inputting MIMML files 
in a variety of ways, including from a file or string;
the MIM API is also capable of importing from a Java 
XML DOM (Document Object Model) Node object or 
by retrieving a MIMML data stream using a URL (Uni- 
form Resource Locator). MIMML datasets can also be 
created de novo and existing datasets can be manipulated. 
The MIM API supports all the constructs of the MIMML 
format, including ancillary constructs such as comments
and generic properties. Using a Java-based XSLT
processing engine, such as Saxon, it is possible to addi- 
tionally validate MIMML datasets against the Schema- 
tron rule set within a Java program. 

Benefits and Disadvantages of XMLBeans for the MIM API 

The MIM API can be used in conjunction with APIs for 
other formats or libraries providing other functionality, 



but the major distinction between the SBML and 
CellML APIs and the MIM API is the usage of an 
XML-binding framework, which comes with both bene- 
fits and disadvantages. One benefit is that the usage of 
XMLBeans has simplified and sped up the development 
of a MIM API. This has allowed the MIM API to reach
its intended audience faster and allows developers to
concentrate on developing applications that support the
MIM notation and make use of the information content
represented by the diagrams rather than on the manual
generation of boilerplate code. Secondly, bugs are minimized
due to the stability of the XMLBeans code base
resulting from over five years of development and use.
Lastly, maintainability is improved through code generation
that aids in the adaptation of software to future
changes made to the underlying MIM schemas. One
disadvantage in using XMLBeans is that it provides 
functionality for a single programming language, which 
may be a deterrent to some developers. Both SBML and 
CellML provide several language bindings [27,28]. As 
the need for the support of other languages increases, 
other XML-binding libraries will be used to make these 
language bindings available. 

Discussion and Conclusions 

The Molecular Interaction Map (MIM) notation pro- 
vides a way to depict bioregulatory network diagrams in 
a standardized manner. The notation was originally 
developed in 1999 and has since been further developed 
and updated, most notably in 2006 [1,21], where a 
detailed description of the glyphs and their usage was 
provided. The MIM notation has been developed in a 
fluid manner that has allowed it to depict a wide range 
of biological concepts including notation for polymerase, 
helicase, and primase activity as well as other symbols 
[3]. These fluid advancements in the notation enhance 
the range of biological networks that can be dia- 
grammed, but can hamper the development of consis- 
tent software for the MIM notation. Here we present a 
well-defined and internally consistent MIM formalism 
and set of tools that can facilitate the development of 
software supporting the major parts of the MIM nota- 
tion in the areas of creation, validation, and analysis of 
MIM diagrams. These tools should also facilitate the
translation of MIMML-formatted diagrams to and from
other formats, such as the BioPAX or the Systems Biol- 
ogy Graphical Notation Markup Language (SBGN-ML) 
format currently in development (http://libsbgn.sf.net/). 
SBGN-ML and other developments by the libSBGN 
group will provide developers with tools similar to the 
ones we present here for MIM, so that pathway editors, 
such as CellDesigner, Edinburgh Pathway Editor, Path- 
Visio, and VANTED, can support SBGN in a common 
manner [23,29-31]. 
A MIM diagram editor, MIMTool, has been developed,
which supports the MIMML format presented herein
(http://code.google.com/p/mimtool/) [Edes, et al., in
preparation]. It is limited in that it does not yet support the
metadata components of the format. MIMTool is asso- 
ciated with the MIMCITY database for MIM diagrams, 
which is expected to also support the MIMML format in 
the future (http://www.mimcity.org) [23]. Additionally, 
PathVisio (http://www.pathvisio.org) is in the process of 
being extended to support the MIM specification and the 
MIMML schema; one key feature being added to PathVisio 
is the ability to render MIMML files (http://discover.nci. 
nih.gov/mim/). The example figures associated with the 
current publication have been produced in PathVisio [32]. 

The current work places a major emphasis on providing
developers with basic tools to facilitate software
development and enhances the level of detail for the 
presentation of MIM concepts. It is hoped that this new 
level of detail simplifies the adoption of additional MIM 
concepts into the SBGN notations or other notations. 
As one example, MIMs can represent protein domains 
as entity features, and this capability is important for 
the depiction of many critical biological signalling path- 
ways. Interactions involving domains can therefore be 
represented with greater flexibility using MIMs than 
with SBGN ER Level 1 Version 1.2, which currently 
does not address domain representation [16]. For an 
example of how SBGN addresses interactions involving 
the domains represented in MIMs, the reader might 
compare Figure 7 for CaMK regulation in the current
paper with Figure 2.1 in the SBGN ER specification. 
While graphical notations in biology have received 
strong attention in recent years, no notation has yet met 
all the needs of users. One of the most recent MIM 
publications [3] outlines the depiction of several new 
glyphs for polymerase, helicase, and primase activity, 
which helps to further the discussion on use cases still 
requiring a standardized depiction. 

Further developments may add new components to the 
specification and the MIMML schema as the usage of 
these components of the MIM notation becomes clari- 
fied. Additionally, the combinatorial interpretation mode 
of MIM diagrams [22] is in the process of being algorith- 
mically defined and will be supported in future software. 

The work presented here makes advances in the usage of 
the MIM notation to visualize data in a way that is more 
"natural" to humans while retaining the qualities of being 
consistent and machine-readable. Participation by software 
developers within our group and collaborators has helped 
to ensure that all elements have a straightforward
implementation. This implementation of the MIM notation will
continue to expand to cover more of the glyphs outlined in
publications on the MIM notation, with each addition acting
as a basis for the development of MIM software support.



Availability and Ongoing Support for the MIM 
Specification and Software 

The schemas and API are free and open source projects
under the Apache License 2.0, which allows users to freely
copy, distribute, and modify the projects and the underlying
source code; this software may also be used in proprietary
software. All project files are stored in our SVN
repository and links to specific files, such as the MIMML 
XML Schema, are provided from the project homepage. 
The MIMML XML Schema is provided with documenta- 
tion in the form of a webpage outlining the various XML 
elements and their attributes. Sample MIMML datasets 
are provided along with a Java Ant build file, which incor- 
porates the Schematron Ant Task to validate the samples 
according to the MIM validation rules; this provides a 
mechanism to enhance the quality of MIM diagrams. The
MIM API is a stable API for the MIMML format meant for
widespread use. Documentation of the attributes and operations
used in the API is provided using Javadoc
(http://java.sun.com/j2se/javadoc/) on the project's SVN
repository. The projects may be updated regularly to support
new features, and contributions are welcome. 

Availability and requirements 

• Project name: MIM Specification, API and Valida- 
tion Rule Set 

• Project home page: http://discover.nci.nih.gov/mim; 
SVN repository: https://ncisvn.nci.nih.gov/svn/mim 

• Operating system(s): Platform independent. It has 
been tested on Mac OS X and Windows. 

• Programming language: Java 

• Other requirements: Java 1.5 or higher, XMLBeans 
2.4.0, ISO Schematron 

• License: Apache License, Version 2.0 

• Any restrictions to use by non-academics: Redistri- 
bution requires compliance with the Apache License, 
Version 2.0. 

The project homepage provides links to project
resources (specification, source code, documentation,
and examples). Source code for the various
components is available via SVN at
https://ncisvn.nci.nih.gov/svn/mim.

Additional material 



Additional file 1: Formal MIM Notation Specification. Formal MIM
specification documenting MIM glyphs and their usage.



Abbreviations 

API: Application Programming Interface; ATP: Adenosine triphosphate; ADP: 
Adenosine diphosphate; BioPAX: Biological Pathway Exchange Language; 
CaMK: Ca2+/calmodulin-dependent protein kinase; CellML: Cell Markup 
Language; DOM: Document Object Model; DNA: Deoxyribonucleic acid; ER:
Entity-Relationship; GenMAPP: Gene Map Annotator and Pathway Profiler; 
GPML: GenMAPP Pathway Markup Language; HGNC: HUGO Gene 
Nomenclature Committee; HUGO: Human Genome Organisation; MIM: 
Molecular Interaction Map; MIMML: Molecular Interaction Map Markup 
Language; RNA: Ribonucleic acid; SBGN: Systems Biology Graphical Notation; 
SBGN-ML: Systems Biology Graphical Notation Markup Language; SBML: 
Systems Biology Markup Language; SPE: Simple Physical Entity; SVG: Scalable 
Vector Graphics; SVN: Subversion; SVRL: Schematron Validation Report 
Language; URL: Uniform Resource Locator; VANTED: Visualization and 
Analysis of Networks containing Experimental Data; XML: Extensible Markup 
Language; XSLT: Extensible Stylesheet Language Transformations 